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Abstract 

There exists a range of different models for estimating and simulat- 
ing credit risk transitions to optimally manage credit risk portfolios and 
products. In this chapter we present a Coupled Markov Chain approach 
to model rating transitions and thereby default probabilities of companies. 
As the likelihood of the model turns out to be a non-convex function of 
the parameters to be estimated, we apply heuristics to find the ML esti- 
mators. To this extent, we outline the model and its likelihood function, 
and present both a Particle Swarm Optimization algorithm, as well as 
an Evolutionary Optimization algorithm to maximize the likelihood func- 
tion. Numerical results are shown which suggest a further application of 
evolutionary optimization techniques for credit risk management. 

1 Introduction 

Credit risk is one of the most important risk categories managed by banks. 
Since the seminal work of |13| a lot of research efforts have been put into 
the development of both sophisticated and applicable models. Further- 
more, de facto standards like CreditMetrics and CreditRisk + exist. Nu- 
merous textbooks provide an overview of the set of available methods, 
see e.g. [3], [T2], and [14] . Evolutionary techniques have not yet been 
applied extensively in the area of credit risk management - see e.g. [5] for 
credit portfolio dependence structure derivations or [TS] for optimization 
of transition probability matrices. In this chapter, we apply the Coupled 
Markov Chain approach introduced by [9] and provide extensions to the 
methods presented in [8]. Section briefly describes the Coupled Markov 
Chain model and its properties, and outlines the data we used for sub- 
sequent sampling. The likelihood function, which is to be maximized is 
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discussed in Section [3] A non-trivial method to sample from the space of 
feasible points for the parameters is outlined in Section [3] Two different 
evolutionary approaches to optimize the maximum likelihood function are 
presented: in Section[5]a Particle Swarm Algorithm is shown, and Section 
[5] introduces an Evolutionary Optimization approach. Section [7] provides 
numerical results for both algorithmic approaches, while Section [H] con- 
cludes the chapter. 

2 Coupled Markov Chain Model 
2.1 Model Description 

In the Coupled Markov Chain model proposed in [9] company defaults 
are modeled directly as Bernoulli events. This is in contrast to standard 
models used in the literature where indirect reasoning via asset prices is 
used to model default events of companies. The advantage of the pro- 
posed approach is that there are no heavy model assumptions necessary 
(normality of asset returns, no transaction costs, complete markets, con- 
tinuous trading . . . ) . 

Portfolio effects in structured credit products are captured via correla- 
tions in default events. Companies are characterized by their current rat- 
ing class and a second characteristic which can be freely chosen (industry 
sector, geographic area, ...). This classification scheme is subsequently 
used to model joint rating transitions of companies. We keep the basic 
idea of the standard Gaussian Copula model 

X = p T + (1 - p)<j>, 

where r is the idiosyncratic part and <j) is the systematic part determining 
the rating transition, while < p < 1 is a relative weighting factor. More 
specifically the Coupled Markov Chain model can be described as follows: 
A company n belongs to a sector s(n) and is assigned to a rating class X l n 
at time t with X* € {0, . . . , M + 1} and t : 1 < t < T, with the credit 
quality decreasing with rating classes, i.e. (M + 1) being the default class, 
while 1 is the rating class corresponding to the best credit quality. The 
ratings of company n are modeled as Markov Chains X*. The rating 
process of company n is determined by 

• an idiosyncratic Markov Chain £„. 

• a component rf n which links n to other companies of the same rating 
class. 

• Bernoulli switching variables 8^ which decide which of the two fac- 
tors determines the rating, with P(<5^ +1 = 1) = g s ( n ),x* , i-e. the 
probability of success depends on sector and rating. 

All the f„ and 8^ are independent of everything else, while the rf n have 
a non-trivial joint distribution modeled by common Bernoulli tendency 
variables \i, i : 1 < i < M, such that 

< = P(x x «-i = 1) and P(»7* > Xi) = P(% x t-i = 0), 
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0.6009 
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0.0009 

0.0016 

0.0153 

0.2131 

1 / 



Table 1: Estimated rating transition probabilities 



i.e. the variables \i are indicators for a (common) non-deteriorating move 
of all the companies in rating class i. The rating changes of companies in 
different rating classes are made dependent by the non-trivial probability 
mass function P x : {0, 1} — > R of the vector \ = (Xii • • ■ j Xm)- 
The Coupled Markov Chain model is of the form: 

Xl = S^^X + (1 - <5n)??n- 

and exhibits properties, which are interesting for practical application. It 
takes a transition matrix P — (pij) as input which governs the probability 
of transitions for £ n and rji , i.e. 

P(£n = j) = Pm(»)j and P(rit = j) = p itj . 

The model is capable of capturing different default correlations for differ- 
ent sectors and rating classes, and is able to give a more accurate picture 
of closeness to default than the standard model by including more than 
two states. The overall transition probabilities of X n again follow P, i.e. 

= j) =Pm(n),j- 

2.2 Data 

Rating data from Standard & Poors has been used, whereby 10166 com- 
panies from all over the world have been considered. The data consists of 
yearly rating changes of these companies over a time horizon of 23 years 
up to the end of 2007. In total a number of 87.296 data points was used. 
The second characteristic is the SIC industry classification code. Sectoral 
information has been condensed to six categories: Mining and Construc- 
tion (1), Manufacturing (2), Transportation, Technology and Utility (3), 
Trade (4), Finance (5), Services (6). Likewise, rating classes are merged in 
the following way: AAA, AA ->- 1, A -» 2, BBB -> 3, BB, B -> 4, CCC, 
CC, C — * 5, D — > 6. These clusters allow for a more tractable model 
by preserving a high degree of detail. The estimated rating transition 
probabilities from the data are shown in Tab. [T] 
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3 Maximum Likelihood Function 

The approach proposed by [9] takes a Markov transition matrix P — 
(Pmi,m 2 )i<mi,m 2 <(M+i) as an input, i.e. 

M+l M+l 

Pi,m = ^2 Pm,i = 1, Vm : 1 < m < (M + 1). 

For (M + 1) rating classes, N companies and S industry sectors the 
parameters of the model are a matrix Q = (qm,a)i< s <s, i<m<M and a 
probability measure P x on {0, 1} satisfying some constraints dependent 
on P (see problem (0). Given rating transition data X ranging over T 
time periods we maximize the following monotone transformation of the 
likelihood function of the model 

L(X;Q,P x ) = Y d log ( P x(x=x) II f(^\s,m 1 ,m 2 ,;Q,P x 

t = 2 \xe{0,l} M s,m 1 ,m 2 



with 



f(x ,s,mi,m2,;Q,P x 



— ^ , mi > m 2 , Xmi 



mi < ra 2 , x mi = 



,gl,, s , otherwise. 

where /* = J t (mi,m2,s) is a function dependent on the data X which 
takes values in N, p+ = YJILi 2W and = 1 - p+ . 

The above function is clearly non-convex and since it consists of a 
mix of sums and products this problem can also not be overcome by a 
logarithmic transform. Maximizing the above likelihood for given data X 
in the parameters P x and Q amounts to solving the following constrained 
optimization problem 

maxQ,p x L(X;Q,P X ) 

S-t q m ,s G [0, 1] 

E^i-Px(x) =pl t , Vi:l<i<M W 



4 Sampling Feasible Points 

To sample from the space of feasible joint distributions for \ (i.e. the 
distributions whose marginals meet the requirements in JT])), first note 
that the distributions P x of the random variable \ are distributions on 

the space {0, 1} M and therefore can be modeled as vectors in R 2 . To 
obtain samples we proceed as follows. 
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1. To get a central point in the feasible region, we solve the following 
problem in dependence of a linear functional 9 : R — * R. 

ma XPx *(P X ) 

s.t. q m ,s £ [0, 1] 

J2^. =1 P X (X) =pt v Vt:l<i<M ^ 

E^^o-PxCx) =i-p+. 

and call the solution set 5(^1/). By generating linear $ functionals 
with random coefficients and picking x + £ <S(^) and x~ £ <S(— \t ) 
we get vertices of the feasible set of distributions for \ modeled as 
a polyhedron in R .In this way we generate a set of vertices V 
for the feasible region Q of the above problem. Note that to enforce 
all the constraints in ([2]) we need M + 1 linear equality constraints 
which describe a 2 M — (M + 1) dimensional affine subspace in R 2 . 

2. Get a central point in c £ by defining 

3. Sample K £ N directions of unit length from a spherical distribu- 
tion (like the multivariate standard normal with independent com- 
ponents) in R 2 _M_1 to get uniformly distributed directions. Map 
these directions to R using an orthogonal basis of the affine sub- 
space A of R 2 described by the last set of constraints in to 
obtain a set of directions T> in A. 

4. For every d £ X> determine where the line c+\d meets the boundary 
of the feasible set of p|. Call the length of the found line segment 
Id- 

5. Fix a number L £ N and sample 



points on the line Id- In this way we get approximately KL samples 



for P x . 



Contrary to obtaining samples from P x , getting suitable samples for 
Q is fairly easy, since all the components are in [0, I] and independent of 
each other. Note that both the set of feasible matrices Q as well as the 
set of feasible measures P x are convex sets in the respective spaces. 



5 Particle Swarm Algorithm 

In the following we give a brief description of the Particle Swarm Algo- 
rithm (PSA), which follows the ideas in [10J . 

1. Choose 5 > and S £ N. 
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2. Generate 5* permissible random samples Xk = (Q k ,P x ) for k = 
1, ...,S as described above, i.e. q k m G [Oil] an< A is consistent 
with the constraints in (J5J. Each sample is a particle in the algo- 
rithm. Set Xk = Xft and = for all k = 1, . . . , S. 

3. Set g *— argmnifc L(xk). 

4. For all particles Xk 

(a) Let the particles fly by first computing a velocity for the fc-th 
particle 

Vk <— co^fc + cm o - Xfc) + c 2 r- 2 o (<? - x k ) (3) 

where co, ci, c 2 are fixed constants, ri and r 2 are random matri- 
ces (component- wise uniform) of the appropriate dimension and 
o is the Hadamard matrix multiplication. Then a new position 
for the particle is found by the following assignment 

Xk <— Xk + Vk- 

(b) If L(x k ) > L(xk) then x k <— x k - 

5. L(xk) > L(g) for some Xk, then g <— x^. 

6. If var(L(xk)) < 5 terminate the algorithm, otherwise go to step 3. 

The main idea is that each particle k knows its best position Xk as of yet 
(in terms of the likelihood function) and every particle knows the best 
position ever seen by any particle g. The velocity of the particle changes 
in such a way that it is drawn to these positions (to a random degree). 
Eventually all the particles will end up close to one another and near to 
a (local) optimum. 

Note that in step 4(a) of the above algorithm, a particle may leave 
the feasible region either by violating the constraints on P x or Q. In 
this case the particle bounces of the border and completes it's move in 
the modified direction. To be more precise: the particles can only leave 
the feasible region, by violating the constraints that either elements of 
Q or probabilities assigned by P x are no longer in [0, 1] (the last two 
constraints in ([2]) can not be violated since the particles only move in the 
affme subspace of R 2 where these constraints are fulfilled) . 

If x\ + v l k > 1 for some 1 < i < 2 2 " + M S (the case where x\ + wj < 
works analogously), determine the maximum distance A that the particle 
can fly without constraint violation, i.e. set 

A= (^4) 

and set x k <— x k + \v k . Now set Vk such that the new velocity makes the 
particle bounce off the constraint as would be expected and make the rest 
of the move, i.e. set 

X <— X + (1 — \)V . 

In the case that the violation concerns an element of Q the modification 
only concerns a change of sign, i.e. v k < v\ and v J k <— v J k for all j ^ i. 
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In the following we describe how to find a bounce off direction, if a con- 
straint on an element of P x is violated: first determine the hyperplane H in 

2 M ( 2 M 1 

R that represents the constraint. Notice that H = < x £ R : Xi = 1 > 
for some i. We use the following Lemma to get a representation of the 
hyperplane H in the affine subspace A. 

Lemma Let A be an affine subspace of M. D with orthonormal basis 
ei, . . . , ed with D < d and H a hyperplane with normal vector n. The 
normal vector of the hyperplane A (~1 H in A is 



d 

Ti 

i=l 



Proof Let H = {x e R D : (x, n) = c} for some cel. then 

Id \ d 

c = (y,n) = (^2(y,ei)ei,n) =^2(y,e i )(e i ,n) 



\i=l / i = l 

y,^2{ei,n)ei \ = (y,n). 
i=i / 

This implies that the point y £ A n H , iff (y, n) — c. 

Using the above Lemma we identify the normal vector n G A of the 
hyperplane H — H n A in A. Without loss of generality we assume 
that ||n|| = 1. Now use Gram-Schmidt to identify a orthonormal system 
n, 2/2, ... , y 22 M _ (M+1) m A and represent v k as 

v k = (n,v k )n + ^2{yi,v k )yi. 

The transformed velocity can now be found as 

v k = -{n,Vk}n+}^{yi,Vk}yi- 

i>2 

Obviously an implementation of the algorithm has to be able to handle 
multiple such bounces in one move (i.e. situation where the new direction 
Vk again leads to a constraint violation) . Since the details are straightfor- 
ward and to avoid too complicated notation, we omit them here for the 
sake of brevity. 



6 Evolutionary Algorithm 

Evolutionary algorithms are well suited to handle many financial and 
econometric applications, see especially [2], [3], and [I] for a plethora 
of examples. 

Each chromosome consists of a matrix Q and a vector P x . While the 
parameters in Q can be varied freely between and 1, and the parameters 
P x do need to fulfill constraints (see above), the genetic operators involv- 
ing randomness are mainly focused around the matrix Q. Therefore, four 
different genetic operators are used: 
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• Elitist selection. A number e of the best chromosomes are added 
to the new population. 

• Intermediate crossover, c intermediate crossovers (linear inter- 
polation) between the matrix Qi and Q2 of two randomly selected 
parents are created using a random parameter A between and 1, 
i.e. two children Q3,P x ,a and Q4,P Xt 4 are calculated as follows: 

Q 3 = AQi + (1 - A)Q 2 ,P X , 3 = P xA , 

Qi = (1 - A)Qi + \Q 2 , P x ,4 = P x a- 

• Mutation, m new chromosomes are added by mutating a ran- 
domly selected chromosome from the parent population, and adding 
a factor <f> in the range [—0.5, 0.5] to the matrix Q. The values are 
truncated to values between and 1 after the mutation. 

• Random additions, r random chromosomes are added with a 
random matrix Q and a randomly selected vector P x from the parent 
population. 

7 Numerical Results 

Both algorithms were developed in MatLab R2007a, while the linear prob- 
lems ([2} were solved using MOSEK 5. A stability test has been conducted 
to validate the results of both optimization algorithms: the maximum 
(pointwise) differences of parameter estimates P x and Q between the dif- 
ferent optimization runs is used to verify that these important parameters, 
which are e.g. used for a credit portfolio optimization procedure, do not 
differ significantly. 

7.1 Particle Swarm Algorithm 

The parameters in © were set to Co = 0.5, ci = 1.5 and C2 = 1.5. The 
algorithm was made to complete 150 iterations with around 200 initial 
samples (where the x are simulated on 40 lines with approximately 5 
samples each) . To test the stability of the algorithm 50 runs of the algo- 
rithm were performed. As can be clearly observed in Fig. [T]the likelihood 
of the best particle as well as the mean likelihood of the swarm converges 
nicely and stabilizes around iteration 25. 

Each run took around 1 hour to complete 150 iterations. Stability 
results are shown in Fig. [3] 

The variances of the populations in every iterations are plotted in 
Figure[5] Since the variances sharply increase from very high initial values 
and in most cases drop to rather low values quickly the plot depicts the 
variance after applying a logarithmic transformation (base 10) as well as 
the mean variance over all the runs. While in most runs the variances 
decreases from values of the magnitude 10 5 to the range of 10 , some 
of the runs end up with significantly lower and higher variances. The 
latter being a sign that the PSA sometimes fails to converge, i.e. the 
particles do not concentrate at one point after the performed number of 
iterations. However, this is not problematic since we are only interested 



8 




Figure 1: Objective function of the PSA algorithm: maximum per iteration. 




Figure 2: Objective function of the PSA algorithm: population mean. 
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Figure 3: Maximum (pointwise) differences of parameter estimates P x for dif- 
ferent runs for the PSA. 
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Figure 4: Maximum (pointwise) differences of parameter estimates Q for differ- 
ent runs for the PSA. 
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Figure 5: Variances of the objective values of the swarm (variances are trans- 
formed with x i— > logio(x) for better interpretability of the results. The mean 
(logarithmic) variance is depicted by the bold line. 



in the likelihood of the best particle which seems to be pretty stable at 
around 400. The results depicted in Figure [3] confirm, that despite the 
high variance in some of the runs the likelihood of the best particle remains 
stable for all the runs. 

7.2 Evolutionary Algorithm 

The following parameters have been used to calculate results: The number 
of maximum iterations has been set to 150. Each new population consists 
of e = 30 elitist chromosomes, c = 50 intermediate crossovers, m = 100 
mutations, and r = 50 random additions. 50 runs have been calculated, 
and 10 tries out of these are shown in Fig. [6]- both the maximum objective 
value per iteration (left) as well as the population mean (right). Due 
to the high number of random additions, mutations and crossovers, the 
mean is relatively low and does not significantly change over the iterations, 
which does not influence the results. The initial population size were 750 
randomly sampled chromosomes, independently sampled for each try. It 
can be clearly seen, that due to the different algorithmic approach, the 
convergence is different from the PSA. 

Each run took approximately 70 minutes to complete the 150 itera- 
tions. Stability results are shown in Fig. 17.21 The population variance is 
shown in Fig. 1101 and clearly exhibits a different behavior than the PSA 
algorithm as expected. 



11 




Figure 6: Objective function of the EA: maximum per iteration. 




Figure 7: Objective function of the EA: population mean. 
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Figure 8: Maximum (pointwise) differences of parameter estimates P x for dif- 
ferent runs for the EA. 
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Figure 9: Maximum (pointwise) differences of parameter estimates Q for differ- 
ent runs for the EA. 
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Figure 10: Population variance of the EA. 



7.3 Comparison of Methods 

Comparing the results of the current implementations of the two optimiza- 
tion heuristics the results found by the PSA consistently yield a higher 
objective value than the solutions obtained with the EA (for the best 
particle/chromosome as well as for the mean). The computing time for 
the two methods is similar and is mainly used for the expensive objective 
function evaluations. Furthermore the presented computational evidence 
shows the typical behavior of the variance given the two heuristic opti- 
mization techniques. While the PSA generally performs slightly better 
than the EA, it might well be that it gets stuck in a local optimum, which 
might be avoided using the EA. One can see from the figures that the 
maximum difference between the estimated parameters for different runs 
are smaller on average for the PSA. However, the analysis of the distribu- 
tion of these differences reveals the interesting fact, that while for the EA 
the differences are more uniform in magnitude and the highest as well as 
the lowest deviations can be observed for the PSA. With the realistically 
sized data set both methodologies are well suited and the final choice is 
up to the bank or company which implements and extends the presented 
method, i.e. has to be based on the expertise available. 

7.4 Application of the Model 

Once the Coupled Markov Chain model has been estimated using evolu- 
tionary techniques shown above, it can be used to simulate rating tran- 
sition scenarios for different sets of companies, which allows for pricing 
and optimization of various structured credit contracts like specific CDX 
tranches, e.g. a Mean-Risk optimization approach in the sense of [11] 
can be conducted for which evolutionary techniques can be used again as 
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shown by e.g. [6] and [7] , such that a whole credit risk management frame- 
work based on evolutionary techniques can be successfully implemented. 

8 Conclusion 

In this Chapter, we presented the likelihood function for a Coupled Markov 
Chain model for contemporary credit portfolio risk management. We pre- 
sented two different heuristic approaches for estimating the parameter of 
the likelihood function. Both are structurally different, i.e. the popula- 
tion mean of each method differs significantly. However, both are valid 
approaches to estimate parameters. Once the parameters are estimated, 
many applications are possible. One prominent example is to generate 
scenarios for future payment streams implied by an existing portfolio of 
Credit Default Swap Indices (CDX) by Monte Carlo simulation. This al- 
lows for assessing the risk of the current position and price products which 
might be added to the portfolio in the future and thereby determine their 
impact on the overall exposure. 
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